considered and related to other psychometric concepts. (Author/NE)

THE STRATIFIED ADAPTIVE COMPUTERIZED
ABILITY TEST



Since the development of the first group ability
test over a half-century ago, paper and pencil tests 
have dominated ability testing. Paper and pencil tests, 
which represent one strategy of measuring human abilities, 
consist of a limited number of test items organized in a 
specified manner which are presented to all testees in 
the same way. Testees proceed through the test items in 
approximately the order in which they are printed in the 
test booklet. The paper and pencil test is thus a highly
standardized testing strategy which was developed to per- 
mit one administrator to test large numbers of testees 
simultaneously. However, the group paper and pencil test 
has a number of deficiencies (Weiss & Betz, 1973) which 
make it desirable to investigate other strategies of ad- 
ministering ability tests. 

The availability of time-shared computer systems now 
makes it possible to implement a variety of new strategies 
for measuring abilities. Interactive computer systems, in 
which the testee can be presented with test items by the 
computer and respond to them on a typewriter keyboard, or 
by means of a light-pen, permit the psychometrician to
develop ways of adapting, or tailoring, test items to each 
individual's estimated ability level. This is accomplished 
as a result of the computer's capacity to receive the 
testee's response to a test item, evaluate that response,
consult a pre-determined set of rules to determine the
next item to be administered, and administer the chosen
next item. In a time-shared computer system, one computer
can administer such adaptive ability tests essentially 
simultaneously to a large number of testees. 
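
The cycle described above (receive a response, evaluate it, consult the rules, administer the next item) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `run_adaptive_test` and its callback parameters `present`, `is_correct`, and `next_item` are hypothetical names introduced here, not part of any system described in this paper.

```python
from typing import Callable, Optional

def run_adaptive_test(
    first_item: str,
    present: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    next_item: Callable[[list[tuple[str, bool]]], Optional[str]],
) -> list[tuple[str, bool]]:
    """Administer items until the rule set returns None (hypothetical sketch)."""
    history: list[tuple[str, bool]] = []
    item: Optional[str] = first_item
    while item is not None:
        response = present(item)              # receive the testee's response
        correct = is_correct(item, response)  # evaluate that response
        history.append((item, correct))
        item = next_item(history)             # consult the rules for the next item
    return history
```

The strategies named in the next paragraph would differ only in the `next_item` rule supplied to such a loop.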

In adaptive testing it is the "pre-determined set of
rules" governing the choice of the next test item to be
administered that differentiates the various strategies of
computerized ability testing. In paper and pencil testing 
each item is administered in succession whether a testee 
answers an item correctly or incorrectly. In adaptive
testing, choice of the next item to be administered is 
contingent upon whether the testee's response to a pre-
vious item, or a set of previous items, was correct or
incorrect. A number of different strategies, or decision 
rules for choice of subsequent test items, have been pro-
posed to implement adaptive testing (Weiss & Betz, 1973).
Among these are two-stage, pyramidal, flexilevel, Bayesian 
and maximum likelihood approaches for tailoring or adapting 
a test to individual differences among testees. 



While each of these available adaptive testing stra- 
tegies has its advantages and unique characteristics 
(Weiss, 1973), logical considerations suggest that addi-
tional ways of moving a testee through an item pool might
be desirable. This paper proposes one such new method,
describes its rationale, and presents some examples based
on actual computerized testing. 

"Peaked" Ability Tests 

A peaked ability test is one in which all test items 
are very similar in difficulty. In the extreme case of 
peakedness, an ability test would have all items of the 
same level of difficulty. Thus, item difficulty would 
have no variance. Since this ideal condition is rather 
difficult to achieve in practice, operational peaked abil- 
ity tests tend to have very low variances of their item 
difficulties, reflecting a set of test items distributed 
over a very narrow range of difficulty. The smaller the 
item difficulty variance, the greater the peakedness.
When the range of the distribution of item difficulties
in a test approaches the range of ability measured by 
that test, and there are an equal number of items at each 
level of difficulty, the distribution of item difficulties 
is said to be rectangular. Most commercial ability tests 
have distributions of item difficulties which lie between 
the extremes of the completely peaked test and the rec- 
tangularly distributed ability test. These tests tend to 
have item distributions which are approximately normally
distributed across the ability continuum. 

In a series of theoretical papers comparing completely 
peaked ability tests (i.e., tests composed of items of 
equal difficulty) with tests "administered" under a variety 
of adaptive testing strategies, Lord (1970; 1971a,b,c)
reached one consistent conclusion: in terms of the pre-
cision of measurement, or the capability of responses to
a set of test items to reproduce accurately the "true 
ability" of hypothetical testees, the peaked test always 
provided more precise measurement than an adaptive test 
of the same length when the testee's ability was at the
point at which the test was peaked. As the testee's
ability deviated from the point at which the test was 
peaked, the measurement efficiency (i.e., the number of 
test items required to achieve a given degree of preci- 
sion) of the peaked test diminished more rapidly than that 
of the adaptive tests. Figure 1 illustrates Lord's general 
finding in this series of studies. As Figure 1 shows, at 
some point on the ability continuum, usually plus or minus 
.50 to 1.0 standard deviations, the efficiency of the
adaptive test becomes higher than that of the peaked test. 



[Figure 1. Efficiency of measurement as a function of ability
level (after Lord, 1970; 1971a,b,c). Recoverable labels: a curve
labelled "adaptive test"; horizontal axis, ability (in z-score
units), with "Average" marked.]



With increasing distance from the peaked point, the adap- 
tive tests become more and more efficient in comparison 
to the peaked test. However, Lord's theoretical results 
did show that peaked tests can provide greater measure-
ment efficiency than all adaptive tests studied thus far
for up to about 70% of a population normally distributed
around the peaked point of the test.

While Lord's theoretical analyses reflect an ideal 
set of conditions (i.e., all test items are of equal 
difficulty and equal discrimination), they are important 
enough not to be easily dismissed. Interpreted in another 
way, Lord's findings indicate that peaked tests provide
most accurate measurement when the ability of the indi- 
vidual being measured is exactly equal to the difficulty 
level at which the test is peaked. His analysis is supple-
mented by the findings of information theory (e.g., Hick,
1951) which indicate that test items provide most infor- 
mation when the probability of a correct answer to a 
given test item is .50 for any individual. Thus, a test 
composed of all items of .50 difficulty for an indivi-
dual would provide the most information about that indi-
vidual's true ability level, and in Lord's terms, the
most precise test score for him.
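
One common way to make the information-theory claim concrete is the Shannon entropy of a correct/incorrect outcome, which is largest when the probability of a correct answer is .50. The formula is not given in this paper, so its use here is an assumption for illustration; the function name `response_uncertainty` is hypothetical.

```python
import math

def response_uncertainty(p: float) -> float:
    """Shannon entropy, in bits, of a correct/incorrect outcome with P(correct) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Entropy rises toward p = .50 and falls off symmetrically on either side.
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"p = {p:.2f}  information = {response_uncertainty(p):.3f} bits")
```

Under this measure an item answered correctly with probability .50 yields one full bit, while items that are nearly always passed or failed yield much less.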

The important aspect of these findings from both test 
theory and information theory is that the test must be 
peaked at the individual's ability level for measurement 
to be most accurate. But ability level is not known in 
advance; it is the test's function to measure ability
level. The typical solution to this problem is to peak 
tests at the estimated ability level of some group of 
testees. Thus, a test designed to measure the abilities 
of college freshmen is peaked at the average ability 
level for college freshmen. Since testees always vary 
in ability, however, the precision of measurement of any 
individual's ability estimate derived from a peaked test 
will depend on the distance of his ability from the esti- 
mated mean ability of the group, as shown in Figure 1. 
Thus, the individual whose ability is at the group mean
will have a test score of maximum precision. But indi- 
viduals whose ability deviates from that mean will obtain 
ability estimates which are less precise, with precision 
decreasing with increasing distance from the mean. For
individuals below the estimated mean ability level of the
group, the test items will be too difficult. For these
testees the probability of correctly answering the items
will be less than .50; the items thus will provide less
information on their true ability level. For individuals
above the estimated mean ability level, the items will be
too easy. Thus, their probability of a correct response
will be greater than .50 and again, the test items will
provide less information about the ability levels of
those testees.
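
The dependence of the probability of a correct response on the distance between a testee's ability and the point at which a test is peaked can be illustrated with a normal ogive item function. The sketch below assumes the two-parameter normal ogive form P = Phi(a(theta - b)) and ignores guessing; the function name `p_correct` and the default discrimination of 1.0 are illustrative assumptions, not values from this paper.

```python
from statistics import NormalDist

def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """Normal ogive probability of a correct answer (no guessing assumed).

    theta: testee ability in z-score units
    b:     item difficulty on the same scale
    a:     item discrimination (illustrative default)
    """
    return NormalDist().cdf(a * (theta - b))

# A test peaked at the group mean (b = 0) for testees at different ability levels.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"ability {theta:+.1f}: P(correct) = {p_correct(theta, b=0.0):.2f}")
```

Only the testee whose ability equals the peaked difficulty has a .50 chance of answering each item correctly; testees below that point fall under .50 and those above it exceed .50.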

Following the administration of a peaked test, it is
possible to tell if the test was appropriate for any 
given individual. If the test is peaked with items of
average difficulty for a group of subjects, the diffi-
culties of the items will be p = .50; i.e., half the
group will have answered each item correctly. The appro- 
priateness of that peaked test for any individual can be 
determined by the proportion of total items taken that 
he/she has answered correctly. A peaked test can be 
thought of as being most appropriate for an individual 
if he gets about half the items correct. Under these 
circumstances each item provides maximum information on 
that testee and his score has maximum precision. If an 
individual answers none of the items on a test correctly 
(or, if guessing is possible, operates at a chance level) 
or answers most or all the items in the test correctly, 
the test was inappropriate for that individual (Lord, 
1971c). However, under conventional ability test admini- 
stration procedures (i.e., paper and pencil tests), the 
appropriateness or inappropriateness of a test for any
given individual cannot be determined until after the
test has been administered. For many uses of test in- 
formation, such post hoc determination of appropriateness 
is too late; the obtained ability estimates may have 
associated with them very large errors which seriously 
reduce their utility in practical situations and frequently 
result in invalid uses of such test scores for practical 
decisions.

Binet's Testing Strategy 

Recognition that a single peaked test may not be
appropriate for a given testee seems to have been im-
plicit in Binet's early work in individual testing. That
work resulted in the Stanford-Binet Scales (Terman and
Merrill, 1960), which are still acknowledged by many as
the "standard" of ability measurement. Binet's approach
to ability measurement, rather than depending on a single
test peaked at the average ability level of the children
whose ability it was measuring, used a series of tests
organized around the concept of "mental age." Test items
at each of the "mental age" levels were peaked around a
given mental age, and there was little overlap between
mental ages. Items were included in a peaked "mental age"
test if about 50% of the norm group of that chronological
age gave correct answers to those items. In other words,
the items in the test labelled "mental age 8.0", for
example, would be those items answered correctly by
approximately 50% of those aged exactly 8.0 years who
were part of the norm group. A similar rationale was used
to construct the tests peaked at each other "mental age"
comprising the Binet test. The Stanford-Binet can thus
be characterized not as one test but as a series of tests, 
each peaked at a given mental age and providing most 
accurate measurement for individuals at that mental age. 

Binet's test administration procedure implicitly
recognizes that peaked tests which do not permit the
testee to obtain about half correct and half incorrect
answers provide little information about his ability and
therefore should not be administered to him. In adminis-
tering the Stanford-Binet, the administrator estimates an
"entry point" into the hierarchy of mental age peaked
tests. The usual entry point consists of that mental age
closest to the testee's chronological age; thus, the testee
whose chronological age is 8 years, 1 month, will likely
start with the test peaked at the 8.0 year level. The
administrator is allowed flexibility, however. If it is
hypothesized on the basis of prior information that the
child is "bright" for his age, the 8 year 1 month child
might be started at the 9.0 mental age test; conversely,
the child who is expected to be "less bright" might be
started at the test peaked at age 6.5.

Following determination of the "entry point" on the 
scaled peaked tests, the administrator administers the 
items of the entry-point peaked test and then moves to 
tests of lesser difficulty. Items are scored as test 
administration proceeds, with the administrator searching 
first for the testee's "basal age" and then for his "ceil-
ing age." Binet's basal age is the peaked test at which
the individual answers all test items correctly. These
data provide no information on an individual's ability
except that it is likely not to be lower than that mental
age. Thus, it is assumed that if the testee were ad-
ministered items from tests peaked at mental ages below
the obtained basal age, he would provide correct answers
to all of those items. If this assumption is correct,
those items also will provide no information on the testee's 
ability level (they would all be too easy), thus nothing 
would be gained by administering them. The "basal age" 
therefore defines a "floor" below which further ability 
testing is unfruitful. 

Similarly, the "ceiling age" provides an upper limit 
beyond which further testing is unnecessary and, in terms 
of testee motivation (e.g., frustration), might even reduce
the accuracy of the test score. The "ceiling age" iden-
tifies the peaked test at which the testee obtains all
incorrect answers. Like the basal age test, in terms of
information theory the test responses provide no infor-
mation. The ceiling age simply indicates that the indi-
vidual's ability is somewhere below that level, but it does
not indicate where on the ability continuum the indivi-
dual is likely to be located. It is also assumed that all
peaked tests above the ceiling age will likely produce
the same results as the ceiling age test, i.e., all re-
sponses would be incorrect, and therefore the tests would
provide no information on the testee's ability level. 

Once the administrator has determined a testee's
basal age, testing proceeds through tests of higher
difficulty until the ceiling age is identified. It is the
peaked tests within the limits defined by the basal and
ceiling ages that will likely provide meaningful infor-
mation on a testee's ability level. The totality of test
items between any testee's basal and ceiling ages will
provide accurate measurement for that individual; for
another testee with different basal and/or ceiling levels
a different set of test items will provide maximum infor-
mation on his ability level. If the test is properly
unidimensional for a given individual, and administration
conditions are optimal, the proportion correct at each
mental age level from the basal age through the ceiling
age should show a regular decrease. If there were a very
large number of mental age peaked tests between the basal
and ceiling ages, proportion correct on these tests would
vary from 1.00 at the basal age, through a test on which
the individual answers approximately .50 of the items
correctly, to .00 correct at the ceiling age. It will
be noted that the area between the basal and ceiling ages
includes a peaked test (at least theoretically) of maximum
measurement efficiency, i.e., a peaked test on which the
individual answers 50% of the items correctly.
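
The basal age, the ceiling age, and the most informative level between them can be read off a testee's proportion correct at each mental age level. The sketch below is a simplified illustration with hypothetical function names and made-up proportions; it is not the Stanford-Binet scoring procedure.

```python
from typing import Optional

def basal_and_ceiling(
    proportion_correct: dict[float, float],
) -> tuple[Optional[float], Optional[float]]:
    """Return (basal, ceiling) mental age levels, or None where not reached.

    basal:   highest level at which every item was answered correctly
    ceiling: lowest level above the basal at which every item was missed
    """
    levels = sorted(proportion_correct)
    passed = [lv for lv in levels if proportion_correct[lv] == 1.0]
    basal = max(passed) if passed else None
    failed = [lv for lv in levels
              if proportion_correct[lv] == 0.0 and (basal is None or lv > basal)]
    ceiling = min(failed) if failed else None
    return basal, ceiling

def most_informative_level(proportion_correct: dict[float, float]) -> float:
    """Level whose proportion correct lies closest to .50."""
    return min(proportion_correct, key=lambda lv: abs(proportion_correct[lv] - 0.5))

# Hypothetical testee: proportion correct decreases regularly with mental age level.
record = {7.0: 1.0, 8.0: 0.83, 9.0: 0.5, 10.0: 0.17, 11.0: 0.0}
print(basal_and_ceiling(record), most_informative_level(record))
```

For this hypothetical record the basal age is 7.0, the ceiling age is 11.0, and the level answered about half correctly is 9.0.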

Assuming that the item pool is relevant for each in-
dividual (i.e., the items are from the culture on which the test
was normed) and that it is unidimensional for each testee,
the Stanford-Binet is the only test which has this charac-
teristic: measurement of any individual's ability is con-
fined to that area of the ability continuum which pro-
vides, over all test items administered, maximum average
information per test item. The Stanford-Binet should,
therefore, provide scores of more nearly constant pre-
cision of measurement than tests which do not have this
adaptive feature, the capability of "searching out" the
individual's ability level among a series of scaled peaked
tests. Perhaps it is this characteristic of the Binet 
tests which has made them the standard of comparison for 
other ability tests. 

Thus, by adapting selection and administration of
peaked tests to the individual being measured, Binet's
concept of ability testing seems to anticipate Lord's
later theoretical findings concerning the efficiency
of peaked tests. The individual administration of the
Binet tests, however, introduces other sources of score
variance which contribute error to the measurements ob-
tained (Weiss & Betz, 1973). In addition to the unrelia-
bility due to scoring, administrator effects such as sex
and race and other characteristics of the administrator
and surrounding conditions serve to offset the increases
in precision of measurement gained from the adaptive
strategy of test administration. 

With the current availability of time-shared compu- 
ters for use as test administration devices, it is now 
possible to minimize the effects of the administrator 
variables which affect test scores, and at the same time 
utilize Binet's insights, with some improvements, in the
ability measurement process. The stratified adaptive 
(STRADAPTIVE) computerized test is proposed as a means 
of obtaining ability test scores with nearly constant 
precision across a wide-ranging group of testees, building 
on the logic of Binet's test administration procedure
and implementing Lord's theoretical findings and those
available from information theory.[1]

The STRADAPTIVE Test 

The stradaptive test, like Binet's testing strategy,
operates from a pool of items stratified by difficulty
level, or organized into a set of scaled peaked tests.
Each testee begins at a difficulty level estimated to
correspond to his ability level, also following Binet's
strategy. By using any of a number of branching pro-
cedures, the stradaptive test moves the testee through
items of varying levels of difficulty in search of a
region of the item pool which will provide maximum in-
formation about his ability level. The branching process
leads to the identification of a "basal stratum" and a
"ceiling stratum". Testing can be terminated when the 
ceiling stratum is reached. Each of these characteristics 
of stradaptive testing is considered below in detail. 



[1] The term "stradaptive" is used rather than "stratified" to
differentiate this approach from Cronbach's (Cronbach,
Gleser, Nanda & Rajaratnam, 1972) conception of stratified
tests, which are based on the idea of sampling test items
from a stratified universe in which test items are classi-
fied by content, task, or difficulty.




Item Pool Structure 



The stradaptive test requires an item pool stratified
by the difficulty levels of its constituent test items. A
stratified item pool is one in which items are organized
into a series of tests peaked at different difficulty
levels. The pool should be known or assumed to be unidi-
mensional. It will be shown below, however, that unidi-
mensionality of the pool might not be evident for some
testees; but the pool should be unidimensional for most
testees in order to provide the most constant precision
of measurement. The steps in developing an item pool for
a stradaptive test include the following: 

1. Administer a large number of items measuring the 
same ability to a large group of subjects. The 
subjects should be representative of the wide- 
ranging population for which the stradaptive 
test is intended. The size of the original item 
pool will depend on the quality of the items 
used and the target size of the final stratified 
item pool. While the optimal size of the stra-
daptive item pool is yet to be determined, ade-
quate results have been obtained with about 200
items in the final pool. Likewise, no informa- 
tion is as yet available on the required number 
of subjects in the norming item pool. Naturally, 
a larger norming group will result in more stable 
item parameter estimates. 

2. Derive item discrimination and item difficulty 
estimates for the items administered to the 
norming group. These parameters can be either 
traditional item parameters (proportion correct,
item-total score correlations) or parameters 
derived from modern test theory using normal 
ogive item assumptions or logistic item functions 
(Lord & Novick, 1968). Items with very low dis-
criminations should be eliminated.

3. Organize the item pool into a number of indepen-
dent strata by difficulty level, where each stra-
tum is, in effect, a peaked test of some number of
items. There should be no overlap in item diffi-
culties between the strata. The number of strata
developed from an item pool, or the number of
peaked tests available, depends on the size of
the original item pool. The larger the number of
strata the more likely the obtained ability tests
will have equal precision across a group of testees
of wide-ranging ability, since the peaked tests
are more likely to exactly match each testee's
ability level. A minimum of nine or ten strata
seems to be appropriate, since that number of
strata seems to provide a good range of coverage
of abilities without requiring very large item
pools. The question is, of course, open for
considerable further investigation.

The number of items at each stratum will vary
with both the size of the original item pool,
and with the number of strata to be developed.
A minimum of ten to fifteen items at any given
stratum appears to be appropriate. There need
not be an equal number of items at the various
strata; experience suggests that the middle and
lower difficulty strata might require more items
than those at the upper extremes.

4. The items within each stratum should be arranged
in decreasing order of item discrimination, if
item discrimination indices were derived from
analyses on the total norming group, as differ-
entiated from indices computed on sub-groups
based on ability levels. Since at the early
stages of testing (i.e., the first few items in
each stratum) items must discriminate across a
wider range of abilities, item discriminations
based on a group of wide-ranging ability will be
more appropriate. On the other hand, at the
later stages of testing when testing is confined
to only a narrow range of abilities (i.e., within
2 or 3 of the available strata), items need not
be able to discriminate on a group of wide-range
ability. Rather, item discriminations should be
based on discrimination indices derived from
closely contiguous levels of ability. Thus, items
with relatively low discrimination indices on the
total group might be capable of discriminating
between contiguous strata at the later stages of
testing (Paterson, 1962; Bryson, 1971).

[Figure 2. Distribution of items, by difficulty level,
in a Stradaptive Test]

The result of this process of structuring the item 
pool is shown diagrammatically in Figure 2. The hypothe- 
tical stradaptive item pool shown in Figure 2 contains 
nine strata. Each stratum consists of a subset of items
peaked around a different difficulty level, with the diffi-
culty level increasing with each successive stratum. Thus
stratum 1 consists of a sub-set of very easy items distri-
buted approximately normally around a difficulty level of
p = .94, with items varying in difficulty from p = .99 to
p = .89; stratum 1, therefore, represents a very easy
peaked test. Stratum 2 consists of a set of items peaked
at a difficulty level slightly higher than those of stra-
tum 1; stratum 2 items are peaked at about p = .83 and
vary from p = .88 to p = .78. Stratum 9 is a difficult 
test with items varying in difficulty from p = .01 to 
p = .11 and peaked at p = .06. Note that the item dis- 
tributions in Figure 2 do not overlap between strata. 

Table 1 shows an operational stradaptive item pool. 
The pool consists of 229 items grouped into 9 difficulty 
strata. The number of items at each stratum varies from 
10 at stratum 9 (the most difficult peaked test) to 36 at 
strata 2 and 3. Items were selected from a larger pool of
about 500 items on which normal ogive transformations of 
item discriminations (a) and difficulties (b) had been 
previously computed using estimates of Lord's (Lord & 
Novick, 1968) normal ogive item parameters. To construct 
the item pool, the range of item difficulties from +3.00
standard deviations to -3.00 standard deviations was
divided into 9 equal parts. All items from the larger 
pool were included in the stradaptive item pool if their 
normal ogive discrimination parameters were a = .3^ or 
above (with the exception of the tenth item at stratum 9 
which was included to increase the number of items at that
stratum to 10).[2]
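
The construction just described (screen items on discrimination, cut the difficulty range into equal parts, order each stratum by discrimination) can be sketched as follows. This is an illustrative sketch only: `Item`, `build_strata`, and `min_a` are hypothetical names, the equal-width cut points are assumed and may not reproduce the exact boundaries behind Table 1, and the screening value used in the example is illustrative rather than taken from the source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    item_id: int
    a: float  # normal ogive discrimination
    b: float  # normal ogive difficulty (z-score units)

def build_strata(
    items: list[Item],
    n_strata: int = 9,
    b_min: float = -3.0,
    b_max: float = 3.0,
    min_a: Optional[float] = None,
) -> dict[int, list[Item]]:
    """Group items into equal-width difficulty strata (1 = easiest)."""
    width = (b_max - b_min) / n_strata
    strata: dict[int, list[Item]] = {s: [] for s in range(1, n_strata + 1)}
    for item in items:
        if min_a is not None and item.a < min_a:
            continue  # drop items with very low discrimination
        index = int((item.b - b_min) // width) + 1
        index = max(1, min(n_strata, index))  # items beyond the range join the end strata
        strata[index].append(item)
    for members in strata.values():
        members.sort(key=lambda it: it.a, reverse=True)  # most discriminating first
    return strata

# Illustrative pool; min_a=0.30 is an assumed screening value, not the paper's.
pool = [Item(1, 0.84, 2.62), Item(2, 0.50, 3.11), Item(3, 0.60, 2.01),
        Item(4, 0.10, 0.00), Item(5, 0.90, -2.65)]
strata = build_strata(pool, min_a=0.30)
print({s: [it.item_id for it in members] for s, members in strata.items() if members})
```

Because strata are defined by non-overlapping difficulty intervals, each item belongs to exactly one peaked test, as the design requires.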

The 9 strata in Table 1 are essentially nine peaked 
tests varying in average difficulty from -2.65 to +2.62.
The most difficult peaked test (stratum 9) is composed of 
10 items peaked at b = 2.62, varying from the most diffi- 
cult item at b = 3.11 to the easiest item in that stratum
at b = 2.32. Stratum 8 is a slightly less difficult peaked 
test with average b = 2.01 and with the 15 items varying 
in difficulties from b = 2.31 to b = 1.65. Within each
stratum items are ordered by discrimination; for stratum 
9 the first item has a discrimination of a = .84, and 
the last item at that stratum has a discrimination of 
a = .21. Similar patterns are obvious for the other 
strata. The greater number of items at the middle and 
lower level of difficulties reflects the composition of 
the original item pool from which these items were selected. 
However, in actual testing with the stradaptive test it 
has become evident that successful testing for many sub- 
jects requires the availability of a larger pool of items 
at the middle and lower ranges of difficulty. 

Operationalizing the Stradaptive Test 

Entry point. The stradaptive test permits the use of
differential entry points for beginning testing for differ-
ent individuals. While it is not necessary to use
differential entries, i.e., all testees can begin with the
same test item, the differential entry point has at least
two major advantages. First, beginning testing at different
strata for different individuals might save time in testing
in terms of the number of items administered to a given in-
dividual. Thus, if it is known or suspected that a given
testee is likely to be high on the ability to be measured,
say 1.5 standard deviations above the mean, it would be
wasteful of the testee's time to begin testing with an
item of average difficulty. Use of a differential entry
point for this individual might save time by eliminating
the administration of three or four unnecessary items.
The time saving would increase as the individual's estimated
ability deviated from an arbitrary fixed entry point.

[2] A further exception is item 19 at stratum 4, which has a
discrimination of .27; that item was included in the pool
by error.

The second major advantage of using a differential 
entry point for beginning testing involves the testee's
motivation to continue testing or to do well. Beginning 
an individual of low ability at an item of median diffi- 
culty will almost insure that the first several items 
taken will be too difficult for him; a frustration or 
anxiety reaction might occur which could adversely affect 
his performance on the remainder of the test items. Con- 
versely, administering items of median difficulty to an 
individual of high ability might cause a boredom or "irrel- 
evance" reaction which could then affect his performance 
on the entire test. 

It thus appears to be desirable to begin the stradap- 
tive test at some point estimated to be approximately re- 
presentative of the individual's ability level on the trait 
being measured. Two sources of entry point estimates are 
possible. First, the computer could have stored informa-
tion on an individual which might be useful as entry point
information. For example, if the stradaptive test is being
used to measure verbal ability, such information as scores
on other verbal ability tests, grades in English courses,
grade point average, or simply number of years of formal
schooling completed could be stored in the computer. Once
the testee identifies himself to the computer by name or
identification number, the computer would retrieve the
appropriate information from his file and, based on known
or estimated relationships between the prior information
and test performance, determine the entry point on the
ability continuum for that testee. 

The testee himself is a second important source of 
entry point information. Rather than consulting actual 
records on the testee, it might be fruitful to ask testees 
for the information necessary to derive entry points.
Figure 3 shows two such entry point questions currently
in use for stradaptive testing of verbal ability. The
top half of Figure 3 is an entry point question for use 
with college students. In constructing the entry point 
estimate it was assumed that college grade point average 
(GPA) had a roughly positive and linear relationship with 
verbal ability. Individuals who answer in the first cate-
gory, 3.76 to 4.00, enter the stradaptive test at stratum
9; individuals who indicate that their GPAs are between
2.51 and 2.75 enter the stradaptive test at stratum 4.

The bottom half of Figure 3 shows a different entry
point question asked of the testee. This entry point
information was developed for use with a group of inner-
city high school students who could not be assumed to know
their GPA and might also prove to be useful in a non- 
school testing situation. It is based on the assumption 
that the testee has a fairly good knowledge of his level 
of ability in comparison to his peers. Whether or not 
the testee can make a good estimate of his ability can be 
determined by the results of the stradaptive testing. 
The only effect of a poor estimate of a testee's entry 
point is that he will be administered a few more test 
items than would otherwise be necessary to measure his 
ability adequately. In any case, the stradaptive test is 
designed to converge upon the testee's level of ability
regardless of the adequacy of the entry point. Thus, 
entry point information need only be very roughly related 
to the ability being measured. 
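
The two entry point questions in Figure 3 amount to simple lookups from a reported category to an entry stratum. The sketch below follows the category boundaries and stratum assignments shown in Figure 3; the function names are hypothetical, and treating each category's lower bound as a cut-off (so that a GPA falling between two printed ranges is assigned downward) is an assumption made here.

```python
def gpa_category(gpa: float) -> int:
    """Category 1-9 of the college-student question in Figure 3."""
    lower_bounds = [3.76, 3.51, 3.26, 3.01, 2.76, 2.51, 2.26, 2.01]
    for category, bound in enumerate(lower_bounds, start=1):
        if gpa >= bound:
            return category
    return 9  # "2.00 or less"

def entry_stratum_from_gpa_category(category: int) -> int:
    """College students: category 1 (highest GPA) enters at stratum 9."""
    if not 1 <= category <= 9:
        raise ValueError("category must be between 1 and 9")
    return 10 - category

def entry_stratum_from_self_rating(better_than: int) -> int:
    """Non-college testees: 'better than n out of 10' enters at stratum n."""
    if not 1 <= better_than <= 9:
        raise ValueError("rating must be between 1 and 9")
    return better_than

print(entry_stratum_from_gpa_category(gpa_category(2.6)))  # GPA 2.51-2.75 -> stratum 4
```

Since the test is designed to converge regardless of the entry point, an error in either lookup costs only a few additional items.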

Branching. The stradaptive test permits the use of
virtually any branching rule for moving from an item at
one stage to one at the next. Branching in the stradap-
tive test occurs between strata; therefore no pre-determined
item branching network exists for the stradaptive test.
The simplest branching rule is an "up-one/down-one" pro-
cedure. If a testee answers an item correctly, he is
routed to an item at the next more difficult stratum; if 
he answers incorrectly he is routed to an item at the next 
easier stratum of difficulty. Other branching rules are 
also possible. For example, a correct response can lead to
an item one stratum higher in difficulty, while an incorrect 
response can branch downward two strata. Such a rule might 
be adopted either where the opportunity for guessing may 
allow the testee to answer a number of items correctly 
solely by chance, or where it is desired to administer a 
very easy item (with a high probability of a correct answer 
for a given individual) following an incorrect response 
in order to prevent the testee from becoming discouraged. 
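
The branching rules just described can be sketched in a few lines of Python. This is only an illustration: the function name, the 9-stratum default, and the decision to hold the testee at the extreme stratum when no higher or lower stratum exists are assumptions made for the example.

```python
def next_stratum(current, correct, up=1, down=1, n_strata=9):
    """Stratum of the next item under an up-`up`/down-`down` branching rule."""
    step = up if correct else -down
    # Assumption: at the edges of the pool the testee stays in the extreme stratum.
    return max(1, min(n_strata, current + step))


# Up-one/down-one: a correct answer at stratum 5 leads to 6, an error to 4.
print(next_stratum(5, True), next_stratum(5, False))
# Up-one/down-two: an error at stratum 5 drops the testee to stratum 3.
print(next_stratum(5, False, down=2))
```

Keeping the step sizes as parameters allows the same function to express any of the fixed-step rules mentioned in the text.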




Figure 3

Stradaptive Test Entry Point Questions

College Students

In which category is your cumulative GPA to date?

                              Entry Stratum
                              (not seen by student)

   1. 3.76 to 4.00                  9
   2. 3.51 to 3.75                  8
   3. 3.26 to 3.50                  7
   4. 3.01 to 3.25                  6
   5. 2.76 to 3.00                  5
   6. 2.51 to 2.75                  4
   7. 2.26 to 2.50                  3
   8. 2.01 to 2.25                  2
   9. 2.00 or less                  1

Enter the category (1 through 9) and press the
return key.

Non-College Students

Everybody is better at some things than others.
Compared to other people, how good do you think
your vocabulary is?

                              Entry Stratum
   Better than:               (not seen by testee)

   1 out of 10                      1
   2 out of 10                      2
   3 out of 10                      3
   4 out of 10                      4
   5 out of 10                      5
   6 out of 10                      6
   7 out of 10                      7
   8 out of 10                      8
   9 out of 10                      9

Type in the number from 1 to 9 that gives the
number of people you are better than (in
vocabulary).

If it is desired to obtain a fairly quick estimate
of the testee's "ceiling stratum" (i.e., the stratum at
which he gets all items incorrect), the tester might use
different branching rules at different stages of testing.
At the earlier stages of testing, he might use an "up-
two/down-two" rule in order to arrive more quickly at a
narrower range of strata in which the testee's ability is
likely to fall. Then, after perhaps the tenth stage of
testing (i.e., ten items have been administered), the
tester might adopt an "up-one/down-one" procedure which
would concentrate item administration within the narrower
range of strata (e.g., 2 or 3) estimated to include the
testee's actual ability level.

The stradaptive test also allows for differential 
response option branching, as suggested by Bayroff (Bayroff,
Thomas & Anderson, 1960). In this procedure, incorrect
response alternatives in a multiple choice (or, for that 
matter, a free-response) test are graded in terms of the 
extent to which they show partial knowledge. A correct 
response always leads to the same upward branching decision.
When an item is answered incorrectly, the step size
of the downward branch (i.e., the number of strata branched 
over) is a function of the "incorrectness" of the chosen 
distractor. For example, a "very wrong" answer (e.g., a 
response given only by testees of very low ability) might 
lead to a downward branch of three steps; a response which 
is closer to being correct might result in branching two 
strata downward, while choice of the most plausible in- 
correct answer would branch the testee only one stratum 
down in difficulty. Such differential response option 
branching should permit more rapid identification of an 
individual's actual ability level, leading to a reduction
in the time needed for the assessment of a particular
ability.
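
Differential response option branching can be illustrated with a short Python sketch. The item, its keyed answer, and the step sizes attached to each distractor are hypothetical; in practice the step sizes would have to be derived from data on how testees of differing ability choose among the distractors.

```python
def branch(current, chosen, key, steps_down, n_strata=9):
    """Next stratum when each distractor carries its own downward step size."""
    if chosen == key:
        return min(n_strata, current + 1)        # correct: always up one stratum
    return max(1, current - steps_down[chosen])  # wrong: step depends on distractor


# Hypothetical item: "B" is keyed correct; "A" is the most plausible
# distractor and "D" the least plausible ("very wrong") one.
steps_down = {"A": 1, "C": 2, "D": 3}
for option in "ABCD":
    print(option, branch(6, option, "B", steps_down))
```

From stratum 6, the four options lead to strata 5, 7, 4, and 3 respectively.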

For individuals whose abilities are at or near the 
highest or lowest stratum in the stradaptive item pool, 
there may be instances where items at higher or lower 
difficulty strata will not be available. In these cases,
it will be necessary to administer successive items at
the same stratum in place of the optimal items at higher
or lower strata.

Termination. A unique feature of the stradaptive test
is its individualized termination rule. In contrast to two-
stage tests, all the pyramidal models, and the flexilevel
test (see Weiss & Betz, 1973, for research on these
strategies, and Weiss, 1973, for detailed descriptions of each),
all of which administer a fixed and pre-determined number
of items to each individual testee, the stradaptive test
permits the number of items administered to each testee
to vary. While both Owen's (1969, 1970) Bayesian adaptive
testing strategy and Urry's (1970) maximum likelihood
strategy do permit an individualized number of test items,
both of these strategies require restrictive assumptions 
about the hypothesized shape of the underlying ability 
distribution, and necessitate sophisticated mathematical 
calculations which might be difficult or time-consuming 
to implement on some computer systems. The stradaptive 
test, while retaining the individualized number of items, 
makes no assumptions about the shape of the ability dis- 
tribution and requires no complex calculations. 

As indicated above, the stradaptive test can be con- 
ceived of as a search for the peaked tests most appropriate 
for an individual testee. These peaked tests, which provide
maximum information on a testee's ability level, can
be identified, after the fact, as tests on which the testee
answered about 50% of the items correctly, if guessing is
not a factor. A peaked test is inappropriate if the testee
answers all items correctly or all items incorrectly. Thus, 
the objective of the stradaptive test is to locate the re- 
gion of the item pool in which measurement efficiency will 
be maximum for any individual. 

This objective can be realized by a simple account- 
ing procedure. Regardless of the branching rules used, 
the computer simply keeps track of 1) the number of items
administered at each stratum and 2) the number of items 
answered correctly at that stratum. After each item has 
been answered, the ratio of these two values, or the pro- 
portion correct at each stratum, is computed. Prior to 
administering the next item, the termination criterion is 
checked to determine whether it has been met. If the 
criterion has been met, testing is stopped and the indi- 
vidual's response record is scored. If not, an addi- 
tional item is selected using the branching rules pre- 
viously chosen for testing. That item is administered 
and scored, the proportion of items correct at each stratum 
is computed, and the termination criterion again checked. 
Testing continues until the termination criterion is met.
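
The accounting procedure can be sketched as a simple loop. The sketch below is illustrative only: `run_test`, its arguments, the cap of 100 items, and the toy testee are all assumptions made for the example, and up-one/down-one branching is used for concreteness.

```python
from collections import defaultdict


def run_test(entry, answer, stop, n_strata=9, max_items=100):
    """Administer items until `stop` is satisfied; return the bookkeeping."""
    given = defaultdict(int)    # items administered at each stratum
    right = defaultdict(int)    # items answered correctly at each stratum
    stratum = entry
    for stage in range(1, max_items + 1):
        correct = answer(stratum)
        given[stratum] += 1
        right[stratum] += int(correct)
        # Proportion correct at every stratum visited so far.
        prop = {s: right[s] / given[s] for s in given}
        if stop(given, prop):               # checked before the next item
            break
        step = 1 if correct else -1         # up-one/down-one branching
        stratum = max(1, min(n_strata, stratum + step))
    return stage, dict(given), prop


# Toy testee who always passes strata 1-6 and always fails strata 7-9.
stage, given, prop = run_test(
    entry=5,
    answer=lambda s: s <= 6,
    stop=lambda n, p: any(n[s] >= 5 and p[s] <= 0.20 for s in n),
)
print(stage, given, prop)
```

For this perfectly consistent toy testee, testing stops after 11 items, with every item at strata 5 and 6 correct and all five items at stratum 7 incorrect.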

One logical criterion for terminating stradaptive 
testing involves identifying the lowest (i.e., easiest) 
stratum at which the individual is answering at a chance 
level. Thus, the stradaptive test can be viewed as a 
search for the testee's "maximum" level of performance
on that set of test items. In a multiple choice test the
chance level is determined by 1/c, where c is the number
of response choices in each test item. Thus, for 5-alternative
multiple choice items, answering 1 (or zero) out of
5 items correctly at a given stratum would indicate chance
responding. Using such a termination rule, then, testing
would continue until a stratum is identified at which the 
testee has responded at chance or below, provided that, 
say, five items have been administered at that stratum. 
The last condition is necessary to avoid the situation 
where a testee answers the first one or two items at a 
given stratum incorrectly, but would answer correctly 
well above chance levels if administered enough items at 
that stratum. Variations in the minimum number of items 
required at any stratum before the proportion correct is 
used to check the termination criterion will probably re- 
sult in stradaptive test scores with varying degrees of 
precision and stability. For example, requiring a larger 
number of items will probably result in fewer inappropriately 
early terminations, while decisions made on smaller numbers 
of items within a stratum might result in some artifactually
early terminations after which further testing may have led 
to higher ability scores. 
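
A minimal sketch of this termination check follows. The function name and the dictionary layout (stratum number mapped to a count) are assumptions made for the example; the chance level 1/c and the minimum of five items per stratum follow the text.

```python
def ceiling_stratum(given, right, n_choices=5, min_items=5):
    """Lowest stratum answered at or below chance (1/n_choices), else None."""
    for s in sorted(given):
        n, k = given[s], right.get(s, 0)
        # k / n <= 1 / n_choices, written without floating-point division
        if n >= min_items and k * n_choices <= n:
            return s
    return None


given = {6: 5, 7: 9, 8: 4}
right = {6: 5, 7: 5, 8: 0}
print(ceiling_stratum(given, right))   # None: only 4 items at stratum 8 so far
given[8] = 5
print(ceiling_stratum(given, right))   # 8: 0 of 5 correct is at or below chance
```

Raising `min_items` corresponds to the more conservative variation discussed above, in which more items are required at a stratum before its proportion correct is allowed to end the test.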

Conceptually, then, the tester can control the degree 
of precision of the ability estimates derived from stra- 
daptive testing by manipulating the termination criterion 
in one of two ways. First, he can require that a larger 
number of items be administered at the ceiling stratum 
before the termination criterion is evaluated for an indi- 
vidual. Secondly, the tester can directly manipulate the 
confidence level of the termination decision. This can be 
accomplished by directly positing an hypothesis of a pro- 
portion of correct responses of, say, p = .20. The ob- 
tained proportion of correct responses (for any specified 
number of items) at a given stratum can then be tested 
against the hypothesized value by standard hypothesis testing
procedures. This would involve either a binomial expansion
given p, q and N (the number of items administered), or 
the computation of a confidence interval around the ob- 
tained proportion of correct responses using the same para- 
meters. The alpha value associated with the test of hypo- 
thesis, or the confidence level of the confidence interval, 
could be chosen in advance by the tester as a way of con- 
trolling the precision of the obtained ability estimate. 
Testing would then continue until the data at any stratum 
failed to reject the hypothesis of chance responding (e.g., 
p = .20), or until the computed confidence interval in- 
cluded the hypothesized chance value. As the number of 
test items at the termination stratum increased, the power 
of the statistical test would also increase, thereby likely 
increasing precision of measurement and such practical 
criteria as test-retest stability of the ability estimates.

The proposed termination rule is applicable to multiple
choice test items with a constant number of response
choices, to true-false test items, and to free-response
test items. For four-choice test items, the pseudo-chance
level is .25; for seven-choice items it is 1/7 or .14, and
for true-false items it is .50. For free-response items,
the termination criterion becomes the lowest stratum at
which the individual answers no items correctly. Thus,
when guessing can be completely ruled out, the stradaptive
test would continue as long as an individual gets any
items correct at strata of increasing difficulty. This
termination criterion is identical to Binet's "ceiling
age."
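
The hypothesis-testing form of the termination rule can be sketched with the binomial expansion directly. The sketch assumes a one-sided test (the alternative being that the testee performs above chance) and an alpha of .05; both choices, like the function names, are assumptions made for illustration rather than part of the procedure as described.

```python
from math import comb


def upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0), from the binomial expansion."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))


def at_chance(k, n, n_choices=5, alpha=0.05):
    """True if k correct out of n fails to reject H0: p = 1/n_choices."""
    return upper_tail(k, n, 1 / n_choices) > alpha


print(at_chance(1, 5))               # 1 of 5 correct on 5-choice items
print(at_chance(5, 5))               # 5 of 5 correct on 5-choice items
print(at_chance(2, 5, n_choices=2))  # true-false items, chance level .50
```

The first and third calls fail to reject chance responding, so testing could stop at that stratum; the second rejects it, so testing would continue. Changing `n_choices` supplies the chance levels for other item types.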

Implementation of the "lowest chance stratum" termi- 
nation rule yields interesting results in actual stradap- 
tive testing with an "up-one/down-one" branching rule. 
In general, for the majority of individuals these proce- 
dures identify a "basal stratum", i.e., a stratum at 
which all items are answered correctly, and a "ceiling 
stratum", i.e., the least difficult stratum at which the 
testee responds at a chance level. In between these two 
limiting strata, the proportion correct on each stratum
will vary between 1.00 and the chance level (.20 or less) 
and will decrease fairly systematically from the basal 
to the ceiling stratum. This pattern is evident even 
when a relatively small number of items has been adminis- 
tered. Specific examples will be given below. 

For some individual testees, inconsistency in their 
response records will occasionally cause the stradaptive 
pool to exhaust the supply of test items at some stratum. 
Thus, for a variety of reasons (e.g., motivation, fatigue,
inappropriateness of the item pool for that testee), some 
individuals will fail to reach a termination criterion at 
a given stratum before exhausting the item pool at that 
stratum. When this occurs, the branching procedure can 
be modified to eliminate downward branching but to continue 
upward branching. Thus, following a correct response the 
testee would be presented with an item at the next higher 
stratum, but following an incorrect response an item at 
the same stratum would be administered if the next lower 
stratum is exhausted. This procedure will lead to a very 
rapid identification of the testee's ceiling stratum, at 
the expense of the probable positively reinforcing value 
of alternating difficult and easier test items. 

Scoring 

Since the stradaptive test adapts item presentation
to characteristics of the individual being tested, the
"number correct" score used almost universally for conventional
tests is inappropriate. Number correct is
inappropriate because the number of items administered
to each individual will vary; some individuals reach termination
in 11 or 12 items, while others require considerably
more items to satisfy the termination criterion. It might be
expected, therefore, that determining the proportion of
items correct for any testee would be an appropriate
method of scoring the stradaptive test. Computing the
proportion correct would account for individual differ- 
ences in the number of items administered yet convey the 
same information as the number correct score. 

However, this reasoning fails to take into account
the fact that in the stradaptive test, item difficulties
are tailored to the individual's ability level through
the branching procedure. The end result of the branching
procedure is to identify a subset of items on which the
individual obtains about 50% correct responses. In the
later stages of stradaptive testing, when the testing
procedure begins to converge on an individual's ability
level, each time an item is answered correctly the testee
receives a more difficult item (at the next higher stratum).
Because that item is likely to be too difficult
for him, he will probably answer it incorrectly and will
therefore receive an easier item. Since he is likely to
get that item correct, the process will be repeated and
the testee will approximately alternate between easier
items and more difficult items until the termination criterion
is reached. The proportion of items correct for an
individual will, therefore, center around .50, with deviations
from .50 due to inappropriate entry points, unusual
testee-item pool interactions, guessing, or an item pool of
inappropriate difficulty. Actual stradaptive testing results
for over 300 testees show that the large majority of
proportions correct vary from .40 to .60.

Since the number correct scores and their derivatives 
are inappropriate for stradaptive tests, new methods of 
scoring must be developed. Some methods that might prove 
satisfactory are suggested by the available research on 
pyramidal adaptive testing models (see Weiss & Betz, 1973,
p. 20-35). Because of some similarities between the stradaptive
models and the pyramidal tests (Weiss, 1973), some
of these scoring methods can be applied to stradaptive testing.
Other scoring methods are suggested by the logic of
the stradaptive test itself, as it derives from Binet's
approach to ability measurement.



Following are a number of ways stradaptive tests can
be scored. Most scoring methods assume that normal ogive
difficulty parameters, or estimates thereof, have been
computed for the items of the stradaptive test so that
item difficulty data are on the same latent scale as
ability estimates; in this way, item difficulties can be
used to estimate the ability of persons correctly answering
subsets of items. In using these parameters it is
assumed that the items in the stradaptive item pool measure
a single unidimensional continuum.

Highest item difficulty scores. These scoring methods
are borrowed from the pyramidal testing models (e.g.,
Paterson, 1962; Bayroff & Seeley, 1967; Lord, 1970). They
are all based on the "hurdle" conception of ability measurement;
that is, the individual's ability level can be determined
from the "height of the highest hurdle he can jump."
The difficulty of an item is equivalent to the height of
the hurdle; answering an item correctly implies jumping
the hurdle. There are three variations of this score
possible in the stradaptive test, with the third being
unique to stradaptive testing:

1. Ability can be scored as the difficulty of the 
most difficult item answered correctly. 

2. Since testing always terminates at an item at
the ceiling stratum, ability can be measured
as the difficulty of the "(n + 1)th" item, or the
item that would have been administered next if
testing had not terminated. Thus, the individual
who answers his final (nth) item correctly would
obtain a higher ability estimate than the testee
who answers the nth item incorrectly.

3. An individual's ability score can be conceived
of as the difficulty of the most difficult item
answered correctly below the testee's ceiling
stratum.

A major weakness of these "highest item difficulty"
scores is their probable unreliability, in terms of test- 
retest stability, if guessing is possible. Since in a 
multiple choice test it might be possible for a testee to 
obtain a correct answer above his true ability level solely 
by chance, the first two of these scoring methods would 
probably be unreliable. Method 2 would probably yield 
scores of somewhat lower reliability than method 1 since 
guessing would be more likely to occur on items at the 
testee's ceiling stratum. Method 3 is suggested as an
alternative unique to the stradaptive test when guessing 
is expected to operate; since method 3 attempts to minimize 
the effects of chance successes, its results should be more 
stable than those of methods 1 or 2. When guessing is not
possible, i.e., on free-response items, methods 1 and 3
will give similar results. Method 2 results will vary
as a function of the adequacy of the termination rule.
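
The difference between scoring methods 1 and 3 can be seen in a short Python sketch. The response record and its difficulty values are hypothetical, and the ceiling stratum is supplied by hand rather than computed.

```python
# Each response: (stratum, item difficulty, answered correctly?).
# The record and its difficulty values are hypothetical.
record = [
    (6, 0.60, True),
    (7, 1.20, True),
    (8, 2.10, False),
    (7, 1.35, True),
    (8, 1.95, True),    # a chance success at the ceiling stratum
    (9, 2.80, False),
    (8, 2.05, False),
    (7, 1.41, False),
]


def method_1(record):
    """Difficulty of the most difficult item answered correctly."""
    return max(d for _, d, ok in record if ok)


def method_3(record, ceiling):
    """Most difficult item answered correctly below the ceiling stratum."""
    return max(d for s, d, ok in record if ok and s < ceiling)


print(method_1(record))             # 1.95, inflated by the lucky item
print(method_3(record, ceiling=8))  # 1.35
```

The single chance success at stratum 8 determines the method 1 score but has no effect on the method 3 score.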

Stratum scores. As indicated above, the stradaptive
item pool can be considered to be a series of peaked tests
graded in difficulty. Associated with each peaked test
is a difficulty level, which can be characterized by the
average difficulty of all items at a given stratum. That
average difficulty level indicates the point on the underlying
ability continuum at which each peaked test is peaked.
It can, therefore, be used as an ability estimate for individuals
in several ways, following the logic of scoring
methods 1 through 3:

4. An individual's score is the difficulty level 
associated with the most difficult stratum at 
which he answered at least one item correctly. 

5. The stradaptive test score can be determined from
the difficulty level of the stratum of the (n + 1)th
item.

6. Test score is the difficulty level of the stratum
just below the testee's ceiling stratum, i.e.,
the difficulty of the highest non-chance stratum
reached.

These stratum scoring methods might result in somewhat more 
stable ability estimates than the "highest item" methods, 
since they would eliminate some of the variability due 
solely to variations in difficulties of specific items 
which would occur in methods 1 to 3. In using scoring
methods 4 through 6, however, the number of possible scores
will be equal only to the number of strata. Thus, when the 
number of strata is small, score variability will be severely 
decreased, leading to loss of information on individual 
differences and lowered correlations with other variables. 
The stratum scoring methods appear appropriate, therefore, 
only when the number of strata in the item pool is quite 
large (e.g., 25 or more). 

Scoring method 6 also does not convey information on 
the proportion of items correct at the stratum just below
the testee's ceiling stratum. At that highest non-chance
stratum, one testee might answer 80% of the items correctly,
while another might answer only 25% of the items correctly;
using scoring method 6, both of these testees would obtain
the same score even though their ability levels are probably
different. It seems appropriate, therefore, to define an
additional method of scoring, the "interpolated stratum
difficulty score", which is designed to take account of
the proportion correct data on individual testees at the
highest non-chance stratum.

7. The interpolated stratum difficulty score can
be defined as:

$$A = D_{c-1} + S\,(p_{c-1} - .50)$$

where $D_{c-1}$ is the average difficulty of the
(c-1)th stratum, where c is the ceiling
stratum. It is, therefore, the average
difficulty of all items available at
the testee's highest non-chance stratum,
or the stratum just below his ceiling
stratum.

$p_{c-1}$ is the testee's proportion correct at
the (c-1)th stratum,

and $S$ is $D_c - D_{c-1}$ if $p_{c-1}$ is greater than .50,
or $D_{c-1} - D_{c-2}$ if $p_{c-1}$ is less than .50,

where $D$ is the average difficulty of the designated
stratum.

The interpolated stratum score assumes that the testee's 
ability lies at the mean of the difficulties of a peaked 
test (i.e., a stratum) if he answers exactly 50% of the
items on that test correctly. If he answers very few of
the items correctly, for example 25%, his ability is
below the mean of that peaked test, tending toward the mean
of the items at the next lower stratum. If the testee
answers 80% of the items at a stratum correctly, his
ability is above the mean of the peaked test and close
to the lower range of ability measured by the items at
the next most difficult stratum. Essentially, then, this
scoring method interpolates the testee's ability level as
a function of the distance between the relevant mean difficulties
of the strata and the proportion of items answered
correctly. In implementing the computations, if the cth
or (c-2)th strata do not exist (i.e., are above or below the
difficulties available in the item pool) the average difficulty
of those hypothetical strata can be determined by
adding or subtracting the constant or increment in difficulty
between strata to the last actual average stratum
difficulty available.
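
The interpolation can be made concrete with a short Python sketch. It assumes the score takes the form A = D(c-1) + S(p(c-1) - .50), which is the form implied by the verbal description above; the function name, the mean stratum difficulties, and the increment used for a missing stratum are hypothetical values chosen for the example.

```python
def interpolated_score(D, c, p, increment=None):
    """Interpolated stratum difficulty score, A = D[c-1] + S * (p - .50).

    D maps stratum number -> mean item difficulty, c is the ceiling stratum,
    and p is the proportion correct at stratum c - 1.
    """
    d_mid = D[c - 1]
    if p >= 0.5:
        # Distance up to the ceiling stratum (irrelevant when p == .50).
        S = D[c] - d_mid if c in D else increment
    else:
        # Distance down to the stratum below the highest non-chance stratum.
        S = d_mid - D[c - 2] if (c - 2) in D else increment
    if S is None:
        raise ValueError("neighbouring stratum missing and no increment given")
    return d_mid + S * (p - 0.5)


D = {6: 0.60, 7: 1.30, 8: 2.00}          # hypothetical mean difficulties
print(interpolated_score(D, 8, 0.50))    # at the mean of stratum 7
print(interpolated_score(D, 8, 0.80))    # above it, toward stratum 8
print(interpolated_score(D, 8, 0.25))    # below it, toward stratum 6
```

When the ceiling stratum lies beyond the pool, the `increment` argument supplies the constant difference in difficulty between strata, as described in the text.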

The interpolated stratum difficulty score, in addition
to having the desirable characteristic of taking
account of more of the information available from stradaptive
testing, has the added advantage of increasing
the range of scores possible over that available from the
other stratum scoring methods.

Average difficulty scores. In an effort to compromise
between the probable unreliability of scoring methods 1-3
and the restricted range of methods 4-6, a number of
average difficulty scores appear to be logically sound:

8. An individual's score can be determined as the
average difficulty of all items answered correctly.

This method continues the "hurdle" analogy of ability scor- 
ing, but attempts to balance out chance factors by using 
an average. A major deficiency of this scoring method is
that scores will be affected by inappropriate entry points. 
If the entry point is too low the testee will be presented 
with, and probably answer correctly, a number of items 
below his true ability level. His ability estimate will, 
therefore, be lower than it should be. An inappropriately 
high entry point will result in the administration of a 
number of items which are too difficult for a given testee. 
The administration of these difficult items might increase 
the probability of chance successes and thereby artifactually
raise test scores based on this method of scoring.

9. Ability can be scored as the average difficulty
of all items correct between (but not including)
the basal stratum (100% correct) and the ceiling
stratum (chance responding).

Thus, the "routing items", those items resulting from too 
high or too low an entry point, will not be scored in this 
method. Therefore, this scoring method will eliminate the 
problems inherent in method 8, and will probably result 
in more stable ability estimates. In order to use this
method, however, the problem of individuals for whom a
clear basal or ceiling stratum cannot be determined must
be solved.

10. The stradaptive test can be scored by determining
the average difficulty of items answered correctly 
at the highest non-chance stratum. 

This method is the average difficulty analogue of method 3.
It essentially identifies the peaked test of highest difficulty
which is not inappropriate for a given testee, eliminating
those that are too difficult and those that are
too easy. It should give ability estimates with good
variability and fairly high stability.

The variety of scoring methods available suggests a
number of interesting research possibilities using stradaptive
tests. Scoring methods may vary in terms of psychometric
characteristics, such as stability, shape of
resulting score distributions, or correlations with scores 
on other testing strategies. Scoring methods may also 
vary in terms of validity and/or utility, with some methods 
better predicting external criteria or being more useful 
in different kinds of situations. Only future research, 
using a variety of empirical, simulation, and theoretical
studies will determine which scoring methods are best 
suited for particular purposes. 
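
The three average difficulty scores (methods 8 through 10) can be compared on a single response record. The record below is hypothetical, as are the function names; the basal and ceiling strata are supplied by hand, and the highest non-chance stratum is taken to be the stratum just below the ceiling.

```python
from statistics import mean

# Each response: (stratum, item difficulty, answered correctly?).
# Hypothetical record with basal stratum 5 and ceiling stratum 8.
record = [
    (5, 0.10, True), (6, 0.75, False), (5, 0.20, True), (6, 0.60, True),
    (7, 1.20, True), (8, 2.10, False), (7, 1.35, False), (6, 0.70, True),
    (7, 1.40, True), (8, 1.95, False),
]


def method_8(record):
    """Mean difficulty of all items answered correctly."""
    return mean(d for _, d, ok in record if ok)


def method_9(record, basal, ceiling):
    """Mean difficulty of correct items strictly between basal and ceiling."""
    return mean(d for s, d, ok in record if ok and basal < s < ceiling)


def method_10(record, ceiling):
    """Mean difficulty of correct items at the highest non-chance stratum."""
    return mean(d for s, d, ok in record if ok and s == ceiling - 1)


print(method_8(record), method_9(record, 5, 8), method_10(record, 8))
```

In this record the two easy routing items at stratum 5 pull the method 8 score down, while methods 9 and 10 ignore them.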

Consistency of Ability Estimates 

The ten scoring methods described above, and others 
yet to be developed, all give "point estimates" of an 
individual's ability. Thus, they each return one value, 
based on some function of the difficulties of the items 
a testee has answered correctly, which indicates the point 
at which he falls on the underlying ability continuum. 
An analysis of the test records of individuals who have 
taken stradaptive tests shows additional information which 
reflects the consistency of the testee's response pattern.
Such consistency data can be interpreted like data on the 
standard error of measurement; it indicates the range of 
confidence which can be attributed to a given ability 
point estimate. Individuals who are more consistent should 
have more stable ability estimates, while those who are 
less consistent should have less stable ability estimates. 
At present, this is only an hypothesis which will need 
empirical verification. 

On stradaptive tests, individual differences occur 
in the number of strata between the basal stratum and the 
ceiling stratum. Thus, it is possible for some indivi- 
duals to have the same score by one or more scoring methods 
(e.g., difficulty of the highest non-chance stratum), but 
the number of strata utilized in obtaining that score will 
differ widely. Some testees are consistent enough in their 
responses that their response records encompass only two or 
three strata. Other testees respond more inconsistently to 
the items, and their response records may encompass five or 
more strata between the basal and ceiling strata. Thus, 
the number of strata used by the testee can be a rough in- 
dex of the consistency of his ability estimate, if items 
resulting from inappropriate entry points are eliminated. 
A related index would be the difference in average diffi- 
culties between the ceiling and basal strata. 

A more meaningful consistency index might be the 
variance or standard deviation of the difficulties of
the items answered correctly between the testee's basal
and ceiling strata. This index would reflect more accurately
the consistency of an individual's stradaptive test
performance. It has the further advantage of being within
the control of the tester. Since the variance is a mean,
adding more items at or near the mid-point of the distri- 
bution of correct responses will reduce the variance. 
Reduction of this variance consistency estimate will occur 
then, by administering additional items at an individual's 
estimated ability level; since these items will have little 
or no deviation from his ability, the variance will continue 
to reduce with additional items. Testing could then continue
in this fashion until a desired "standard error of
measurement" was reached. At the same time that the vari- 
ance reduction occurs by administering additional items, 
indicating greater confidence in the ability estimate,
the ability estimate itself should stabilize due to the 
greater number of items administered. 
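
The proposed consistency index can be sketched as follows. Two choices here are assumptions rather than part of the proposal: "between" is read as excluding the basal and ceiling strata themselves (as in scoring method 9), and the population form of the standard deviation is used. The response record is hypothetical.

```python
from statistics import pstdev


def consistency_index(record, basal, ceiling):
    """Population SD of difficulties of correct items between basal and ceiling."""
    return pstdev([d for s, d, ok in record if ok and basal < s < ceiling])


# Each response: (stratum, item difficulty, answered correctly?); hypothetical.
record = [
    (6, 0.60, True), (7, 1.20, True), (6, 0.70, True), (7, 1.40, True),
    (7, 1.35, False), (8, 2.10, False),
]
before = consistency_index(record, basal=5, ceiling=8)

# Two further correct items close to the middle of the distribution.
longer = record + [(7, 0.95, True), (7, 1.00, True)]
after = consistency_index(longer, basal=5, ceiling=8)
print(before, after)
```

The index falls when the additional items lie near the mid-point of the correct responses, which is the property the tester would exploit in continuing the test until a desired value is reached.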

Individuals differ also in the number of items necessary 
to reach a termination criterion. In over 350 stradaptive 
tests administered to college students, the median number 
of items required to reach termination was 18; the shortest
stradaptive test required only 9 items and the longest
required 160 items. Individuals who required a larger
number of items also utilized a larger number of strata. 
The number of items required for termination, therefore, 
is a rough indication of an individual's consistency of 
response. Only further research on the relationship of 
this additional individual differences variable with other 
consistency data and with other data external to the stra- 
daptive testing procedure will determine its utility. 

Illustrative Results from Stradaptive Testing 

The previous sections have described the essential 
characteristics of the stradaptive test. However, to 
understand the method more completely, it is helpful to 
see the results of its application with actual testees. 
The following figures are graphical illustrations of the 
response records of a number of college students who took 
stradaptive tests. 3 The 9-stratum item pool used consisted 
of 229 5 -response choice vocabulary items ; the s true ture 
of the item pool is shown in Table 1. Entry point infor- 
mation was the student's report of his/her GPA as shown 
in Figure 3. An "up-one/down-one" branching rule was used. 
Termination occurred when a stratum was identified at which 



The stradaptive test administration program was written 
by Robert S\visher; the display program was written by 
David Vale. 
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Figure k 
REP0RT 0N STRADAPTIVE TEST 

DATE TESTED! 73/07/12 



STRATUM! 



<EASY) 
1 2 



(DIFFICULT) 



PR0P.C0RR! 




1.00 1.00 
T0TAL PR0P0RTI0N C0RRECT« .550 
SC0RES 0N STRADAPTIVE TEST 

1. DIFFICULTY 0F M0ST DIFFICULT ITEM C0RRECT- 1.49 

2. DIFFICULTY 0F THE M*l TH ITEM- 1.44 

3. DIFFICULTY 0F HIGHEST N0N-CHANCE ITEM C0RRECT* 1.49 

4. DIFFICULTY 0F HIGHEST STRATUM 
WITH A CORRECT ANSWER- 1.33 

5. DIFFICULTY 0F THE M+1 TH STRATUM- 1.33 

6« DIFFICULTY 0F HIGHEST N0N-CHANCE STRATUM- 1.33 

7. INTERPOLATED STRATUM DIFFICULTY- 1.37 

8. MEAN DIFFICULTY 0F ALL C0RRECT ITEMS- .88 
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9. MEAN DIFFICULTY dt CORRECT ITEMS BETWEEN 
CEILING AND BASAL STRATA 

10. MEAN DIFFICULTY 0F ITEMS CORRECT 

AT HIGHEST N0N-CHANCE STRATUM- 1.28 



1.28 
Table 2

Number of items administered (N) and cumulative
proportion correct (p) by stage, for William W.

                                Stratum
           5           6           7           8         Total
Stage    N     p     N     p     N     p     N     p     N     p

   1     1  1.00                                         1  1.00
   2                 1  1.00                             2  1.00
   3                             1  1.00                 3  1.00
   4                                         1  0.00     4   .75
   5                             2  1.00                 5   .80
   6                                         2  0.00     6   .67
   7                             3  1.00                 7   .71
   8                                         3  0.00     8   .63
   9                             4   .75                 9   .56
  10                 2  1.00                            10   .60
  11                             5   .80                11   .64
  12                                         4  0.00    12   .58
  13                             6   .67                13   .54
  14                 3  1.00                            14   .57
  15                             7   .57                15   .53
  16                 4  1.00                            16   .56
  17                             8   .50                17   .53
  18                 5  1.00                            18   .56
  19                             9   .56                19   .58
  20                                         5  0.00    20   .55

the proportion of correct responses was .20 or less, based
on a minimum of five items completed at that stratum. Test
items were presented to the student on a cathode-ray
terminal (CRT) with responses recorded through the CRT
typewriter keyboard.

A typical response record. Figure 4 shows the stradaptive
test performance of "William W.," a college sophomore.
This test record is typical of the stradaptive
test performance of college students. William was first
presented with an entry point screen (Figure 3) and
indicated that his cumulative grade point average to
date was between 2.76 and 3.00. He thus began the
stradaptive test at stratum 5. His answer to the first item
was correct (indicated by a + in Figure 4), which
branched him to the first available item in stratum 6.
Correct answers to the second and third items resulted
in his moving to stratum 8, where he received the first
item from that more difficult peaked test. Since the
stage 4 item was too difficult for him, his response was
incorrect (-), and he branched downward to the next
available item in stratum 7. William then alternated between
correct and incorrect responses for the items at stages 6
through 8, followed by an incorrect response to the stage 9
item. This returned him to stratum 6 for his tenth item. With
a few minor deviations, William then essentially alternated
between correct and incorrect responses from stages 11
through 20. Item 20 terminated the stradaptive test since
the testing procedure had, at that point, located William's
ceiling stratum; at stratum 8 William had answered all 5
items incorrectly.

Table 2 shows a complete "accounting" of William's
stradaptive test performance. As the data in Table 2
indicate, tentative estimates of William's "basal" and
"ceiling" strata were evident by stage 10; at that point
he had 100% of the items correct at stratum 6, 75% correct
at stratum 7, and none correct at stratum 8; his total
percent correct at stage 10 was 60%. However, these
percentages were based on only 2, 4, and 3 items, respectively,
and therefore were not likely to be very stable. Since
the termination criterion had not been met (i.e., 20% or
fewer of the items correct, based on 5 items administered at
a stratum), the stradaptive test continued. As additional
items were administered, William continued to answer all
items at stratum 6 correctly, and at stratum 7 answered
some items correctly and some incorrectly. By stage 19,
he had completed the first 9 items available at stratum 7
and had answered 56% of those correctly. The final item
administered (stage 20) was the fifth item at stratum 8,
which he answered incorrectly.
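
The branching and termination rules described above can be traced mechanically. The following is a minimal sketch, not the original administration program: it assumes the up-one/down-one rule and the termination criterion (.20 or less correct on at least five items at a stratum) exactly as stated in the text, and it takes William's item responses as read from Table 2. The function name, the "+"/"-" encoding, and the clamping of moves at strata 1 and 9 are assumptions made for illustration.

```python
def replay_record(entry_stratum, responses, n_strata=9,
                  min_items=5, chance_level=0.20):
    """Replay a response record under up-one/down-one branching.

    Returns (strata visited, stage at termination, ceiling stratum);
    the last two are None if the criterion is never met.
    """
    stratum = entry_stratum
    tallies = {}   # stratum -> (items administered, items correct)
    visited = []
    for stage, mark in enumerate(responses, start=1):
        correct = (mark == "+")
        visited.append(stratum)
        n, k = tallies.get(stratum, (0, 0))
        n, k = n + 1, k + int(correct)
        tallies[stratum] = (n, k)
        # Termination: .20 or less correct on at least five items here.
        if n >= min_items and k / n <= chance_level:
            return visited, stage, stratum
        # Up one stratum after a correct answer, down one after an error.
        # Clamping at the ends of the pool is an assumption of this sketch.
        if correct:
            stratum = min(stratum + 1, n_strata)
        else:
            stratum = max(stratum - 1, 1)
    return visited, None, None


# William W.'s responses at stages 1-20 (+ correct, - incorrect),
# as read from Table 2; he entered at stratum 5.
WILLIAM = "+++-+-+--++--+-+-++-"
visited, stop_stage, ceiling = replay_record(5, WILLIAM)
print(visited)
print(stop_stage, ceiling)
```

Replaying the record this way visits the same strata, stage by stage, as the entries in Table 2, and the criterion is first met at the fifth stratum 8 item.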
The last column of Table 2 shows the proportion
correct at each stage of the stradaptive test. That
proportion shows a steady step-like decrease from 100%
correct at stage 1 to 55% correct at stage 20. It is
typical of stradaptive test performance for the
proportion correct at the final stage to be near .50; in William's
test performance the proportion correct stayed between .50
and .60 from stage 12 through termination.
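
As an illustration of how the last column of Table 2 is obtained, the short sketch below recomputes the cumulative proportion correct after each stage from William's sequence of correct and incorrect responses. The response string is read from Table 2; the function name and the "+"/"-" encoding are assumptions made for this example.

```python
from itertools import accumulate

# William W.'s responses at stages 1-20 (+ correct, - incorrect),
# as read from Table 2.
WILLIAM = "+++-+-+--++--+-+-++-"


def running_proportion(responses):
    """Cumulative proportion correct after each stage."""
    correct = [int(mark == "+") for mark in responses]
    return [k / n for n, k in enumerate(accumulate(correct), start=1)]


proportions = running_proportion(WILLIAM)
for stage, p in enumerate(proportions, start=1):
    print(f"stage {stage:2d}: {p:.2f}")
```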

Figure 4 also shows stradaptive test scores for William,
using the scoring methods described earlier. As might be
expected, the "highest difficulty" scores produced the
highest ability estimates, and methods 1 and 3 gave the
same results since William answered no items correctly at
or above his ceiling stratum. Methods 4, 5, and 6 gave
identical results for similar reasons; with a different
set of test responses, however, these results would differ.
The "average difficulty" methods gave the lowest ability
estimates as a group, since the averages were lowered by
the inclusion of the less difficult items.

William's stradaptive test performance (Figure 4) is
an example of a slightly low entry point. Because he
entered at stratum 5, which was below his basal stratum 6,
his response to the first item conveyed no information.
However, it did serve to route him to the higher strata 
where testing was concentrated. Eliminating the first 
item administered from total proportion correct gives a 
proportion of .^5 correct for William at the termination 
of testing. 

High entry point. Occasionally an entry point is too
high; an example is shown in Figure 5 for "Carol C." Carol
reported her GPA to be in category 3.01 to 3.25 (see
Figure 3); this led to an entry at stratum 6. Her item
responses quickly showed that the tests at strata 6, 5, 4,
and 3 were too difficult for her. On the first six items
Carol gave only one correct answer, an apparent "lucky
guess" to a stratum 4 item. The routing procedure quickly
brought Carol to strata 3, 2, and 1, which were composed
of easier test items. Once she reached these strata her
response pattern converged quickly on a region of the item
pool in which she answered about 50% of the items correctly.
Her total proportion correct was only .375; with the
routing items due to the erroneous entry point
(items 1 through 5) eliminated, however, Carol obtained 5
correct answers out of 11 items in stages 6 through 16, for
an effective proportion correct of .45. Disregarding the
first 5 routing items, Carol's stradaptive test performance
is similar to that of William. In both cases the stradaptive test



Figure 5
REPORT ON STRADAPTIVE TEST

NAME: CAROL C.          DATE TESTED: 73/07/12

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

STRATUM:        1     2     3     4     5     6     7     8     9
PROP. CORR.: 1.00   .80  0.00   .50  0.00  0.00
TOTAL PROPORTION CORRECT = .375

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = -.70
 2. DIFFICULTY OF THE N+1 TH ITEM = -1.92
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = -1.68
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = -.63
 5. DIFFICULTY OF THE N+1 TH STRATUM = -1.92
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = -1.92
 7. INTERPOLATED STRATUM DIFFICULTY = -1.73
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = -1.81
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = -1.94
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = -1.94

Figure 6
REPORT ON STRADAPTIVE TEST

DATE TESTED: 73/04/09

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

PROP. CORR.: 1.00  .80  0.00
TOTAL PROPORTION CORRECT = .455

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = -.52
 2. DIFFICULTY OF THE N+1 TH ITEM = -.75
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = -.52
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = -.63
 5. DIFFICULTY OF THE N+1 TH STRATUM = -.63
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = -.63
 7. INTERPOLATED STRATUM DIFFICULTY = -.44
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = -.81
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = -.63
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = -.63



identified a ceiling stratum (none correct or chance
responding), a basal stratum (all correct), and a peaked
test in between on which the testee obtained an
intermediate proportion correct. In Carol's case the optimal
peaked test was at stratum 2, on which she obtained 80%
correct responses, while William's optimal peaked test
was at stratum 7, on which he obtained 56% correct
responses. It is interesting to note that William's entry
point was lower than Carol's, yet their terminal ability
levels were quite the reverse.
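
The identification of ceiling and basal strata can be sketched from the per-stratum tallies of a completed record. The example below is an illustration under stated assumptions rather than the report's scoring program: the ceiling is taken as the lowest stratum with .20 or less correct on at least five items, and the basal stratum as the highest stratum below the ceiling with all items correct. William's tallies come from Table 2; Nancy's are derived from the counts and percentages given in the text. The function name and data layout are hypothetical.

```python
def classify_strata(tallies, min_items=5, chance_level=0.20):
    """Return (basal, ceiling, strata in between) for one test record.

    tallies maps stratum -> (items administered, items correct).
    """
    props = {s: k / n for s, (n, k) in tallies.items()}
    ceilings = [s for s, (n, _) in tallies.items()
                if n >= min_items and props[s] <= chance_level]
    if not ceilings:
        # No stratum meets the termination criterion.
        return None, None, []
    ceiling = min(ceilings)
    basals = [s for s in props if s < ceiling and props[s] == 1.0]
    basal = max(basals) if basals else None
    floor = basal if basal is not None else 0
    between = sorted(s for s in props if floor < s < ceiling)
    return basal, ceiling, between


# William W.: final tallies by stratum, from Table 2.
william = {5: (1, 1), 6: (5, 5), 7: (9, 5), 8: (5, 0)}
print(classify_strata(william))

# Nancy N.: 1 item at stratum 7, 6 at stratum 8 (83% correct),
# 10 at stratum 9 (60% correct), derived from the text.
nancy = {7: (1, 1), 8: (6, 5), 9: (10, 6)}
print(classify_strata(nancy))
```

For a record such as Nancy's, where no stratum yields chance-level performance, the sketch reports no ceiling, which corresponds to a testee who is "off the top" of the item pool.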

Rapid convergence. When the entry point estimate is
accurate, the stradaptive test record can be quite short.
Figure 6 shows an actual test record for "John J." John
entered at stratum 5 and immediately began alternating
between correct and incorrect responses through stage 8.
An incorrect response at stage 8 led to the identification
of the basal stratum (although based on only one item) at
stratum 3. Finally, an incorrect response on the stage 11
item permitted John to reach the termination criterion in
only 11 items, having identified stratum 5 as John's
ceiling stratum. John's ability level lies in the vicinity of
stratum 4, at which he answered 80% of the items correctly.
Over all 11 items administered, John answered 5, or a
proportion of .455, correctly.

Item pool too easy. Occasionally the stradaptive item
pool is too easy, or too difficult, for a testee. Figure 7
shows the stradaptive test performance of "Nancy N."
Nancy entered at stratum 8, based on a GPA estimate in the
range of 3.51 to 3.75, almost an A average. With the
exception of the stage 6 item, at stratum 7, testing of
Nancy was confined to the difficult peaked tests at strata
8 and 9. Seventeen items were administered to Nancy, with
10 of them at stratum 9, the stratum with the most
difficult items in the stradaptive item pool. Since stratum 9
contained only 10 items, testing was terminated. It is
obvious that further testing of Nancy would be unproductive
even if additional items were available at stratum 9.
Nancy answered 83% of the items correctly at stratum 8,
and 60% correctly at stratum 9. Since it would be quite
unlikely that stratum 9 could be her ceiling stratum (.20
or less correct), no purpose would be served by further
testing. In this case, the stradaptive test simply indicates
that Nancy's ability is very high, but it is unable to give
an estimate of exactly how high it is since she is apparently
"off the top" of the most difficult test in the stradaptive
pool. However, her ability is probably not as high
as that of an individual who would answer all items correctly
at stratum 9. The latter individual would answer 100% of the



Figure 7
REPORT ON STRADAPTIVE TEST

NAME: NANCY N.          DATE TESTED: 73/04/09

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

STRATUM:        7     8     9
PROP. CORR.: 1.00   .83   .60
TOTAL PROPORTION CORRECT = .706

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = 3.11
 2. DIFFICULTY OF THE N+1 TH ITEM = I
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = 3.11
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = 2.62
 5. DIFFICULTY OF THE N+1 TH STRATUM = 3.27
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = 2.62
 7. INTERPOLATED STRATUM DIFFICULTY = 2.69
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = 2.24
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = 2.35
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = 2.63

items correctly, while Nancy answered only 60% correctly.
Thus, the total proportion correct can be a rough
indicator of the appropriateness of the stradaptive item pool
for an individual. When that proportion, corrected for
routing, is between .40 and .60, it indicates a test
record appropriately adapted to the individual's ability
level.
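
The correction for routing amounts to dropping the routing items from both the numerator and the denominator of the proportion correct. The sketch below works through Carol's record as described in the text (16 items with 6 correct overall; items 1 through 5 treated as routing items, one of which was answered correctly). The function names are hypothetical, and the .40 to .60 band is the one stated above.

```python
def routing_corrected_proportion(n_items, n_correct,
                                 n_routing, n_routing_correct):
    """Proportion correct after discarding the routing items."""
    remaining = n_items - n_routing
    if remaining <= 0:
        raise ValueError("no items left after removing routing items")
    return (n_correct - n_routing_correct) / remaining


def well_adapted(p, low=0.40, high=0.60):
    """True when the proportion falls in the .40 to .60 band."""
    return low <= p <= high


# Carol C.: 16 items, 6 correct; 5 routing items, 1 of them correct.
carol_raw = 6 / 16
carol_corrected = routing_corrected_proportion(16, 6, 5, 1)
print(f"raw {carol_raw:.3f}, corrected {carol_corrected:.3f}")
print(well_adapted(carol_raw), well_adapted(carol_corrected))
```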

Two problems arose in computing scores for Nancy's
stradaptive test performance. Scoring method 2, which
determines score on the basis of the difficulty of the
(n+1)th item, could not be implemented for Nancy. Since she
answered her last item correctly and it was the last item
at stratum 9, the next item to be administered would have
been an item at stratum 10. There were, however, only 9
strata in the stradaptive item pool. Thus, the difficulty
of the (n+1)th item is indeterminate in Nancy's case, and an
"I" is given on the computer report. A similar problem
arose in computing the interpolated stratum difficulty
score (method 7). Since Nancy answered 60% of the items
correctly at stratum 9, her ability could be estimated to
be above the mean difficulty of the stratum 9 peaked test
(z=2.62, based on .50 correct). To compute the
interpolated stratum difficulty score, the increment between
the strata in the item pool, approximately .655, was
added to the mean difficulty of stratum 9; Nancy's score
was then interpolated into the interval between 2.62 and
3.27 by the formula given earlier.
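
The interpolation formula itself is given earlier in the report and is not restated in this section. The sketch below is therefore only an assumed reconstruction: a linear rule that starts from the mean difficulty of the highest non-chance stratum and moves toward the adjacent stratum in proportion to how far the proportion correct at that stratum lies from .50. This rule reproduces, to within rounding, the method 7 scores printed in Figures 4 through 9. The stratum difficulties are those shown on the axis of Figure 11; the function name, the value used for a hypothetical stratum beyond either end of the pool, and the treatment of stratum 1 are assumptions.

```python
# Mean difficulty of strata 1-9, from the axis of Figure 11.
STRATUM_DIFFICULTY = [-2.65, -1.92, -1.29, -0.63, 0.00, 0.65, 1.33, 2.01, 2.62]


def interpolated_score(stratum, p, edge_increment=0.65):
    """Assumed linear interpolation around the highest non-chance stratum.

    stratum: highest non-chance stratum (1-9)
    p:       proportion correct at that stratum
    """
    d = STRATUM_DIFFICULTY[stratum - 1]
    if p >= 0.5:
        # Interpolate toward the next more difficult stratum; beyond
        # stratum 9 a hypothetical stratum one increment higher is used.
        if stratum < len(STRATUM_DIFFICULTY):
            step = STRATUM_DIFFICULTY[stratum] - d
        else:
            step = edge_increment
    else:
        # Interpolate toward the next easier stratum (assumed symmetric).
        if stratum > 1:
            step = d - STRATUM_DIFFICULTY[stratum - 2]
        else:
            step = edge_increment
    return d + (p - 0.5) * step


print(f"William: {interpolated_score(7, 5 / 9):.2f}")
print(f"Tom:     {interpolated_score(6, 5 / 7):.2f}")
print(f"Nancy:   {interpolated_score(9, 0.60):.3f}")
```

Under these assumptions Nancy's value falls inside the interval from 2.62 to 3.27, as the text describes.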

Consistent vs. inconsistent response records. As
indicated above, stradaptive test records can reflect
individual differences in consistency of test performance.
Figures 8 and 9 contrast the test records of "Tom T."
and "Dixie D." In both cases entry into the item pool
was at about the same level of difficulty; Tom entered at
stratum 6 while Dixie began at stratum 7. For the first
8 items, both Tom and Dixie alternated between items at
strata 6 and 7, and both had moved to the easier items at
stratum 5 by the 10th stage of testing. After two items
at stratum 5, Tom recovered quickly to stratum 6 and reached
the termination criterion after 14 items. Tom's basal
stratum was stratum 5, and stratum 7 was his ceiling stratum.
His highest non-chance stratum was stratum 6, at which he
answered 71% of the items correctly.

Dixie's test performance, although similar to Tom's
in the earlier stages of testing, diverged sharply after
the twelfth item. At that point she began to answer easier
items incorrectly, finally being presented with an item
from stratum 3 at the 17th stage of testing. Dixie's response
Figure 8
REPORT ON STRADAPTIVE TEST

DATE TESTED: 73/07/02

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

PROP. CORR.: 1.00  .71  [remaining values illegible]
TOTAL PROPORTION CORRECT = .571

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = 1.11
 2. DIFFICULTY OF THE N+1 TH ITEM = 1.89
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = .79
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = 1.33
 5. DIFFICULTY OF THE N+1 TH STRATUM = 2.01
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = .65
 7. INTERPOLATED STRATUM DIFFICULTY = .80
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = .52
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = .59
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = .59

Figure 9
REPORT ON STRADAPTIVE TEST

NAME: DIXIE D.          DATE TESTED: 73/04/09

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

PROP. CORR.: 1.00  [illegible]  .33  0.00
TOTAL PROPORTION CORRECT = .489

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = .73
 2. DIFFICULTY OF THE N+1 TH ITEM = .78
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = .73
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = .65
 5. DIFFICULTY OF THE N+1 TH STRATUM = .65
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = .65
 7. INTERPOLATED STRATUM DIFFICULTY = .54
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = -.30
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = -.09
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = .59

record then shows a series of wide swings between items
at stratum 3 and those at stratum 6. While many testees
converge on strata that are contiguous, Dixie's responses
seem to show a convergence somewhere between strata 3 and
6. Thus, ability estimates derived from Dixie's stradaptive
testing are likely to be less precise than those from Tom's
responses. Dixie finally worked her way back up to
stratum 7 after 47 items to satisfy the termination criterion.

Dixie's testing thus used five of the available nine
strata, while Tom used only three. For both Tom and Dixie
the ceiling stratum was stratum 7, but while Tom's basal
stratum was stratum 5, Dixie's was stratum 3.
Stratum 6 was the highest non-chance stratum for both, but
Tom's ability is probably closer to that of stratum 7
than to stratum 5, since he answered 71% of the items
correctly at stratum 6. Dixie's, however, is more toward
stratum 5, since she answered only 33% correctly at
stratum 6. The difference is reflected by the interpolated
stratum difficulty scores of .80 and .54 for the two testees,
respectively. These two response records show how
stradaptive test performance can differ in terms of both the
number of items administered and the number of strata used
for ability determination.

Another example of inconsistent stradaptive test
performance is shown in Figure 10. This test record, for
"Carl C.," shows a range of fluctuation even wider than
that of Dixie D. (Figure 9). Carl seemed to answer almost
optimally (i.e., about 50% correct) on the three peaked
tests of strata 5, 6, and 7. His performance fluctuated
rather consistently from strata 4 through 8, and he even
attempted one item (27) at stratum 9, following a probable
lucky guess at stratum 8. Carl's basal stratum was
stratum 4 (100% correct) and his ceiling stratum was stratum
8 (20% correct). Between these two he answered slightly
more than 50% of the items correctly, with an overall
proportion correct of .54. Carl's inconsistent performance
on the stradaptive test stands in sharp contrast to that
of, say, John J. (Figure 6), whose very consistent response
record covered only three strata, and who reached the
termination criterion in only 11 items. The utility of this
information on individual differences in consistency of
performance on the stradaptive test will be determined only
through further research. Logically, however, it seems that
such information could be used to derive individualized
"standard errors of measurement."

Implications of Proportion Correct Data 

The data in Figures 4 through 10 illustrate an
interesting characteristic of stradaptive test records. For
Figure 10
REPORT ON STRADAPTIVE TEST

DATE TESTED: 73/07/12

[Item-by-item response plot by stratum, 1 (EASY) to
9 (DIFFICULT), not recoverable.]

PROP. CORR.: 1.00  .57  .50  [remaining values illegible]
TOTAL PROPORTION CORRECT = .536

SCORES ON STRADAPTIVE TEST

 1. DIFFICULTY OF MOST DIFFICULT ITEM CORRECT = 2.31
 2. DIFFICULTY OF THE N+1 TH ITEM = [illegible]
 3. DIFFICULTY OF HIGHEST NON-CHANCE ITEM CORRECT = 1.49
 4. DIFFICULTY OF HIGHEST STRATUM
    WITH A CORRECT ANSWER = 2.01
 5. DIFFICULTY OF THE N+1 TH STRATUM = 1.33
 6. DIFFICULTY OF HIGHEST NON-CHANCE STRATUM = 1.33
 7. INTERPOLATED STRATUM DIFFICULTY = [illegible]
 8. MEAN DIFFICULTY OF ALL CORRECT ITEMS = .47
 9. MEAN DIFFICULTY OF CORRECT ITEMS BETWEEN
    CEILING AND BASAL STRATA = .60
10. MEAN DIFFICULTY OF ITEMS CORRECT
    AT HIGHEST NON-CHANCE STRATUM = 1.27

most individuals completing a stradaptive test, the
proportion of correct responses at the various strata decreases
as the difficulty of the strata increases. These results
are summarized in Figure 11, which plots the proportion of
correct responses at each stratum. With the exception of
the plots for Carl C. and Carol C., these plots resemble
item trace lines (Lord & Novick, 1968). The steepness of
the slope can be interpreted as an index of the
consistency of responses of the individual and the capability of
the item pool to "discriminate" that individual's ability
level. The point of inflection of the curve (i.e., the
point on the horizontal axis at which the testee answers
50% of the items correctly) could be interpreted as the
"difficulty" of the item pool for the individual, or his
position on the latent ability continuum.
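
One simple way to locate the 50% point on such a plot is to interpolate linearly between the two adjacent strata whose proportions correct bracket .50. The report does not specify a procedure, so the sketch below is only an assumed illustration; the function name is hypothetical. It uses the stratum difficulties from the axis of Figure 11 and William's proportions correct at strata 5 through 8 as given in Table 2.

```python
def half_point(difficulties, proportions, target=0.5):
    """Ability value at which the piecewise-linear trace crosses target.

    difficulties: stratum difficulties in increasing order
    proportions:  proportion correct at each of those strata
    Returns None if the trace never falls through the target.
    """
    points = list(zip(difficulties, proportions))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 >= target > p1:
            # Linear interpolation between the two bracketing strata.
            return d0 + (p0 - target) / (p0 - p1) * (d1 - d0)
    return None


# William W.: strata 5-8, difficulties from Figure 11,
# proportions correct from Table 2.
william_point = half_point([0.00, 0.65, 1.33, 2.01],
                           [1.0, 1.0, 5 / 9, 0.0])
print(f"{william_point:.3f}")
```

A steeper drop between the bracketing strata leaves less room for the crossing point to move, which is one way to read the remark above about slope as an index of consistency.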

Reasoning analogically from item characteristic curve
theory, non-regular item characteristic curves, such as
those for Carl C. and Carol C., might indicate item
pool-testee interactions which are inappropriate. Thus, both
Carol and Carl might not be interacting with the item pool
on a unidimensional continuum. In order to get a more
accurate ability estimate for such testees, it might be
necessary to multidimensionally scale their response patterns
to obtain subsets of test items (if possible) on which they
responded in unidimensional fashion, as indicated by their
test response "trace lines." Thus, Carl's and Carol's response
records might be analyzed by appropriate scaling methods to
find the intra-individual probabilistic Guttman-type scales
underlying their response patterns.

The "trace line" plots for John J., Tom T., and William
W. approximate the classic step function Guttman-type trace
line. Dixie D.'s trace line plot is very similar to the
normal ogive probabilistic analogue of the Guttman trace
line. Future research based on stradaptive tests with a
large number of strata may lead to mathematization of these
trace line ideas, which in turn may lead to greater utility
for this type of test data.

It is interesting to note that the stradaptive test
performance of many testees results in a Guttman-like
scaling of the testee's performance with respect to the
item pool. Since the stradaptive test developed from the
testing rationale originally proposed by Binet, it follows
that perhaps Binet's ability testing logic had embedded in
it an unarticulated primitive version of Guttman's ideas
and the present-day derivatives of modern test theory as
derived from latent trait theory.
[Plot not recoverable. Horizontal axis: stratum, from 1 (Easy)
to 9 (Difficult), with the corresponding ability values.]

Stratum:     1      2      3      4     5     6     7     8     9
Ability: -2.65  -1.92  -1.29   -.63   .00   .65  1.33  2.01  2.62

Figure 11. Proportion correct at each stratum,
by individual

Conclusions

The stradaptive test is an operational computer-based
testing model which draws simultaneously from Binet's
pioneering work in ability measurement and from ideas in
modern test theory. The testing procedure makes no
restrictive assumptions about the nature of underlying
ability distributions (beyond those involved in norming
the item pool), and its implementation does not require
complicated mathematical calculations. The procedure is
also flexible with respect to size and composition of the
item pool, branching rules, termination rules, and scoring
methods. Data derived from the stradaptive test response
record, including number of items completed, range of
difficulties used, patterns of movement through the item
pool, and various other methods of measuring a testee's
interaction with a specified item pool, appear to have
promise as new sources of information derivable from
ability testing.

The availability of the stradaptive testing strategy
poses many new research questions. Among these are the
optimal characteristics (e.g., size, number of strata) of
the stradaptive item pool, methods of selecting and
placing items in the pool, variations in branching rules,
applications of stochastic models to the branching process,
variations in step size, effects of various termination
rules, the reliability and utility of the various scoring
methods proposed and those yet to be developed, methods of
expressing an individual's consistency or the accuracy of
test scores, methods of controlling the accuracy of test
scores within the stradaptive framework, and relationships
of stradaptive scores and ability estimates to those derived
from other adaptive strategies. These research questions
should be studied by a variety of approaches, including live
testing empirical studies, simulation studies, and theoretical
studies, with the results of each approach supporting and
nourishing research using the other approaches.